Thread: c code: split string || multiple/optional delimiters || memcpy - optimize

  1. #1
    Registered User
    Join Date
    Oct 2013
    Posts
    87

    c code: split string || multiple/optional delimiters || memcpy - optimize

    Hi devs,

    I am writing code to parse strings that contain genetic information. The data are messy and inconsistent.
    For example,
    GENE1;GENE2
    GENE21,GENE22
    GENE12
    I need the name of each gene separately: GENE1, GENE2, and so on. A string may contain no comma or ; at all, as in GENE12.

    strtok didn't seem helpful in this situation, which is why I wrote this piece of code.

    The code below works, but I think it is error-prone.

    Code:
    #include <stdio.h>
    #include <string.h>
    
    //  gcc -Wpedantic -Wextra -Wall hello.c -o print
    int main(void)
    {
        //char tab[] = "hello;morgan;chase;capital,house";
        char tab[] = "hello,discover"; //split this string
        //char tab[] = "hello";
        printf("we have value as tab %s\n", tab);
    
        size_t itr; //loop index
        size_t str_len = strlen(tab); //length of the input string
    
        size_t temp_itr = 0; //index of the last delimiter seen (0 = none yet)
        size_t len; //length of the current sub-string
        char temp_gene[200]; //holds the current sub-string
    
        for (itr = 0; itr < str_len; itr++)
        {
            if (tab[itr] == ',' || tab[itr] == ';')
            {
                if (temp_itr == 0)
                {
                    //no comma or ; seen yet: copy from the start of the string
                    len = itr;
                    memcpy(temp_gene, tab, len);
                }
                else
                {
                    //skip the previous delimiter at temp_itr
                    len = itr - temp_itr - 1;
                    memcpy(temp_gene, tab + temp_itr + 1, len);
                }
    
                temp_gene[len] = '\0'; //terminate right after the copied bytes
                printf("split wise is %s\n", temp_gene);
                temp_itr = itr;
            }
        }
    
        if (temp_itr == 0)
        {
            //no comma or ; was found: the whole string is one gene
            len = str_len;
            memcpy(temp_gene, tab, len);
        }
        else
        {
            //copy what follows the last delimiter
            len = str_len - temp_itr - 1;
            memcpy(temp_gene, tab + temp_itr + 1, len);
        }
        temp_gene[len] = '\0'; //terminate right after the copied bytes
    
        printf("final split wise is %s\n", temp_gene);
    
        return 0;
    }
    I find this code a little makeshift.

    I would like to improve it so that it doesn't need multiple if checks, for instance the if (temp_itr == 0) at the end for the case where no comma or ; is found.

    Thank you.
    Last edited by deathmetal; 05-10-2021 at 07:51 AM.
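
    One way to drop the temp_itr == 0 special cases, sketched here with the standard strcspn function rather than strtok: strcspn returns the length of the run before the next delimiter (or before the end of the string), so a string with no delimiter is simply a single field. split_fields and FIELD_MAX are hypothetical names made up for this sketch, and the 64-byte cap on a name is an assumption, not something known about the data.

```c
#include <stddef.h>
#include <string.h>

#define FIELD_MAX 64 /* hypothetical cap on one gene name, including the NUL */

/* Copies each field of `s` (split on any character in `delims`) into `out`.
   Returns the number of fields stored, at most `max_fields`.
   Fields longer than FIELD_MAX - 1 characters are truncated. */
size_t split_fields(const char *s, const char *delims,
                    char out[][FIELD_MAX], size_t max_fields)
{
    size_t count = 0;

    while (count < max_fields) {
        /* Length of the run up to the next delimiter or the end of the string. */
        size_t len = strcspn(s, delims);
        size_t copy = len < (size_t)FIELD_MAX - 1 ? len : (size_t)FIELD_MAX - 1;

        memcpy(out[count], s, copy);
        out[count][copy] = '\0';
        count++;

        if (s[len] == '\0')
            break;    /* last field: no delimiter follows it */
        s += len + 1; /* step over the delimiter */
    }
    return count;
}
```

    Unlike strtok, this leaves the input untouched and keeps empty fields, so "a;;b" yields three fields with an empty one in the middle.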

  2. #2
    Structure (null pointer)
    Join Date
    May 2019
    Posts
    338

    > GENE1;GENE2
    > GENE21,GENE22
    > GENE12
    > I need the name of each gene separately: GENE1, GENE2, and so on. A string may contain no comma or ; at all, as in GENE12.
    Code:
    #include <stdio.h>
    #include <string.h>
    
    #define MAX_GENES 128
    #define MAX_NAME  256
    
    char gene[MAX_GENES][MAX_NAME];
    int genecount = 0; // number of genes stored
    
    void splitgenes( const char *string ) {
      int which = 0, count = 0;
      size_t len = strlen(string); // computed once, not on every iteration
      for (size_t i=0; i<len && which<MAX_GENES; i++) {
        if ( string[i] == ';' || string[i] == ',' || string[i] == '\n' ) {
          gene[which][count] = '\0'; // close the current name
          which++; count = 0;
        } else if ( count < MAX_NAME-1 ) { // drop characters that don't fit
          gene[which][count++] = string[i];
        }
      }
      if ( which < MAX_GENES ) {
        gene[which][count] = '\0'; // close the last name
        which++;
      }
      genecount = which;
    }
    
    int main( void ) {
        char genestring[] = "GENE1;GENE2\nGENE21,GENE22\nGENE12";
    
        splitgenes( genestring );
    
        for (int i=0; i<genecount; i++) {
            printf( "%s, ", gene[i] );
        }
    
        printf( "\n" );
        return 0;
    }
    Last edited by Structure; 05-10-2021 at 09:15 AM.
    "without goto we would be wtf'd"

  3. #3
    Salem (and the hat of int overfl)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > strtok isn't helpful in this kind of situation that is why I worked on this piece of code.
    How so?

    You can pass multiple delimiters to strtok.
    Code:
    #include <stdio.h>
    #include <string.h>
    int main ( ) {
        char tab[] = "hello;morgan;chase;capital,house";
        const char *sep = ";,";
        for ( char *p = strtok(tab,sep) ; p != NULL ; p = strtok(NULL,sep) ) {
            printf("Got %s\n", p);
        }
    }
    Output (run on C++ Shell):
    Got hello
    Got morgan
    Got chase
    Got capital
    Got house
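
    Two strtok behaviours are worth knowing before relying on it for this kind of data: it overwrites each delimiter that ends a token with '\0', so the buffer must be writable, and it treats a run of delimiters as one, so empty fields are silently skipped. A minimal sketch of both; count_tokens is a hypothetical helper, not part of the code in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Counts the tokens strtok finds in `buf`.
   `buf` is modified: each delimiter that ends a token becomes '\0'. */
size_t count_tokens(char *buf, const char *delims)
{
    size_t n = 0;

    for (char *p = strtok(buf, delims); p != NULL; p = strtok(NULL, delims))
        n++;
    return n;
}
```

    For a buffer holding "GENE1;;GENE2" this returns 2, not 3, and afterwards the buffer reads as just "GENE1". A buffer holding "GENE12" returns 1 and is left unchanged.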
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Registered User
    Join Date
    Oct 2013
    Posts
    87
    Originally posted by Salem:
    > > strtok isn't helpful in this kind of situation that is why I worked on this piece of code.
    > How so?
    >
    > You can pass multiple delimiters to strtok.
    >
    > Got hello
    > Got morgan
    > Got chase
    > Got capital
    > Got house
    Thank you. I don't know why it wasn't splitting the string on my end.
    It also works with the string below, which has no comma or ;

    Code:
    char tab[] = "house";
    Perfect.

  5. #5
    Registered User
    Join Date
    Apr 2021
    Posts
    140
    What are the valid characters that make up a "GENE", exactly? Is it just "ACTG", or is "actg" allowed too? Are there other characters?
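
    The thread does not answer this. Purely as an illustration of why it matters, here is a sketch of a check that a split-out name uses only an expected alphabet, assuming names consist of uppercase letters and digits as in the samples (GENE1, GENE22). GENE_CHARS and is_valid_gene_name are hypothetical names, and the alphabet is an assumption to adjust once the real rule is known.

```c
#include <stdbool.h>
#include <string.h>

/* ASSUMPTION: gene names use only uppercase letters and digits.
   The real alphabet is not stated in the thread; edit this to match the data. */
static const char GENE_CHARS[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

/* Returns true if `name` is non-empty and every character is in GENE_CHARS. */
bool is_valid_gene_name(const char *name)
{
    /* strspn gives the length of the leading run of allowed characters;
       the name is valid only if that run reaches the terminating NUL. */
    return name[0] != '\0' && name[strspn(name, GENE_CHARS)] == '\0';
}
```

    With this alphabet, "GENE12" passes while "gene1", "GENE 12" and the empty string are rejected.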
